AITopics | Concord

Evaluating Long-Context Reasoning in LLM-Based WebAgents

Chung, Andy, Zhang, Yichi, Lin, Kaixiang, Rawal, Aditya, Gao, Qiaozi, Chai, Joyce

arXiv.org Artificial Intelligence | Dec-5-2025

As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long-context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long-context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50% in baseline conditions to less than 10% in long-context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long-context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.
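The context-construction step described in this abstract (irrelevant trajectories injected between dependent subtasks until a target length is reached) can be sketched roughly as follows. This is a minimal illustration, not the authors' framework: the names `count_tokens` and `build_history` are hypothetical, token counting is approximated by whitespace splitting, and distractors are spread round-robin across the gaps.

```python
import random


def count_tokens(text):
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def build_history(subtasks, distractors, budget, seed=0):
    """Interleave distractor trajectories between dependent subtasks.

    The last subtask is held out as the current query; earlier subtasks
    plus injected distractors form the interaction history. Distractors
    are added until the next one would exceed the token budget.
    """
    if len(subtasks) < 2:
        raise ValueError("need at least two dependent subtasks")
    rng = random.Random(seed)
    pool = list(distractors)
    rng.shuffle(pool)

    *earlier, current = subtasks
    gaps = [[] for _ in earlier]  # distractors placed after each earlier subtask
    used = sum(count_tokens(s) for s in subtasks)

    i = 0
    while pool and used + count_tokens(pool[-1]) <= budget:
        d = pool.pop()
        gaps[i % len(gaps)].append(d)
        used += count_tokens(d)
        i += 1

    history = []
    for subtask, gap in zip(earlier, gaps):
        history.append(subtask)
        history.extend(gap)
    return history, current, used
```

Raising `budget` lengthens the history without changing the dependent subtasks, which is the variable the benchmark is described as sweeping from 25,000 to 150,000 tokens.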

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.04307

Country:

North America > The Bahamas (0.14)
North America > United States > New York (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(11 more...)

Genre:

Workflow (0.93)
Research Report > New Finding (0.93)

Industry:

Media (1.00)
Consumer Products & Services (1.00)
Transportation (0.93)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks

Yu, Xinling, Serrallés, José E. C., Giannakopoulos, Ilias I., Liu, Ziyue, Daniel, Luca, Lattanzi, Riccardo, Zhang, Zheng

arXiv.org Artificial Intelligence | Dec-20-2023

We propose Physics-Informed Fourier Networks for Electrical Properties (EP) Tomography (PIFON-EPT), a novel deep learning-based method for EP reconstruction using noisy and/or incomplete magnetic resonance (MR) measurements. Our approach leverages the Helmholtz equation to constrain two networks, responsible for the denoising and completion of the transmit fields, and the estimation of the object's EP, respectively. We embed a random Fourier features mapping into our networks to enable efficient learning of high-frequency details encoded in the transmit fields. We demonstrated the efficacy of PIFON-EPT through several simulated experiments at 3 and 7 tesla (T) MR imaging, and showed that our method can reconstruct physically consistent EP and transmit fields. Specifically, when only 20% of the noisy measured fields were used as inputs, PIFON-EPT reconstructed the EP of a phantom with ≤5% error, and denoised and completed the measurements with ≤1% error. Additionally, we adapted PIFON-EPT to solve the generalized Helmholtz equation that accounts for gradients of EP between inhomogeneities. This yielded improved results at interfaces between different materials without explicit knowledge of boundary conditions. PIFON-EPT is the first method that can simultaneously reconstruct EP and transmit fields from incomplete noisy MR measurements, providing new opportunities for EPT research.
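A random Fourier features mapping of the kind this abstract mentions is commonly written as γ(x) = [cos(2πBx), sin(2πBx)], with the rows of B drawn once from a Gaussian and then held fixed. The sketch below shows that common form in plain Python. It is an assumption that PIFON-EPT uses this variant, and `make_fourier_mapping`, `num_features`, and `sigma` are illustrative names rather than the paper's.

```python
import math
import random


def make_fourier_mapping(in_dim, num_features, sigma=1.0, seed=0):
    """Return a function mapping a coordinate to 2 * num_features features."""
    rng = random.Random(seed)
    # Frequency matrix B: sampled once from N(0, sigma^2), never trained.
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
         for _ in range(num_features)]

    def mapping(x):
        # Project the input onto each random frequency, then take cos and sin.
        proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x))
                for row in B]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return mapping
```

The mapped vector would be fed to the network in place of the raw coordinates. In this formulation a larger `sigma` yields higher-frequency features, which is the usual knob for trading smoothness against fine detail.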

equation, neural network, pifon-ept, (14 more...)

arXiv.org Artificial Intelligence

2302.11883

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Contra Costa County > Concord (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Gur, Izzeddin, Furuta, Hiroki, Huang, Austin, Safdari, Mustafa, Matsuo, Yutaka, Eck, Douglas, Faust, Aleksandra

arXiv.org Artificial Intelligence | Oct-2-2023

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.

arxiv preprint arxiv, language model, website, (12 more...)

arXiv.org Artificial Intelligence

2307.12856

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(22 more...)

Genre: Research Report (0.63)

Industry:

Leisure & Entertainment (0.46)
Information Technology (0.46)
Banking & Finance > Real Estate (0.33)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Creative AI, FinOps among hot developer trends of 2023

#artificialintelligence | Dec-29-2022, 16:02:11 GMT

A handful of important trends will transform the software developer experience in 2023, as enterprises consider more self-hosting, observe more SaaS consolidations and see an upswing of interest in creative AI. Also, as AI enters the creativity realm, it threatens to upend the future of app dev. And OpenAI's ChatGPT, released in November, takes code completion beyond line suggestions: in addition to writing complete web pages and simple applications, it can generate new programming languages. For developers, the 2022 job market started strong, but by December, they saw storm clouds as layoffs hit the tech sector. Experts felt vibes of the early 2000s recession and the pandemic's early days.

application, finop, zazueta, (12 more...)

#artificialintelligence

Country: North America > United States > California > Contra Costa County > Concord (0.05)

Industry: Information Technology > Software (0.77)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Lyft Opens Testing Facility for Self-Driving Cars, Adds Chrysler Minivans | Digital Trends

#artificialintelligence | Nov-9-2019, 22:52:20 GMT

Lyft is planning a significant expansion of its autonomous car testing program. The company is opening a new testing facility, adding vehicles to its fleet, and racking up more test miles. Like rival Uber, Lyft believes self-driving cars are the future of ridesharing. Lyft's self-driving cars are now driving four times as many miles per quarter in autonomous mode as they were six months ago, Luc Vincent, Lyft's executive vice president of autonomous driving, wrote in a blog post. The company currently gives rides in test vehicles to employees, and the number of routes where these rides are available has tripled in the past year, Vincent wrote.

chrysler minivan digital trend, lyft, vehicle, (8 more...)

#artificialintelligence

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.06)
North America > United States > California > Santa Clara County > Palo Alto (0.06)
North America > United States > California > San Mateo County > East Palo Alto (0.06)
(3 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)

High School Sophomore Arrested For Hacking Computer System, Changing Grades Of Other Students

International Business Times | May-15-2018, 21:59:07 GMT

A Northern California teen was arrested Wednesday for hacking a school district's computer system and changing the grades of up to 15 students. Authorities said they arrested David Rotaro, a sophomore at Ygnacio Valley High School in Concord, California, for infiltrating the school district's computer system. Rotaro, 16, said it was like "stealing candy from a baby," according to KGO-TV, an ABC affiliate in San Francisco. It took him five minutes to design a "phishing email" that he sent out to swipe login information from school faculty. Authorities didn't release Rotaro's name; however, he confessed to having committed the crime during an interview with KGO-TV.

artificial intelligence, computer system, rotaro, (9 more...)

International Business Times

Country:

North America > United States > California > San Francisco County > San Francisco (0.27)
North America > United States > California > Contra Costa County > Concord (0.27)
Europe > France > Île-de-France > Paris > Paris (0.07)

Genre: Personal > Interview (0.59)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (0.81)
Education > Educational Setting > K-12 Education > Secondary School (0.65)

Technology:

Information Technology > Artificial Intelligence (0.91)
Information Technology > Security & Privacy (0.81)

Where are self-driving cars being tested?

FOX News | Mar-19-2018, 22:55:51 GMT

An Arizona woman was killed after being struck by a self-driving Uber vehicle, an incident believed to be the first of its kind. But Uber is not the only company that has experienced accidents with driverless cars. Companies like Google, Tesla and General Motors also join the list. An Arizona woman was killed after being struck by a self-driving Uber vehicle this week - prompting the company to suspend all testing of self-driving vehicles in cities across the country. The Uber was in autonomous mode at the time of the collision in Tempe, and there was a vehicle operator behind the wheel, police said.

artificial intelligence, self-driving car, vehicle, (12 more...)

FOX News

Country:

North America > United States > Arizona (0.54)
North America > United States > California > San Francisco County > San Francisco (0.09)
North America > United States > Nevada > Clark County > Las Vegas (0.07)
(5 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)

Uber's Robo-Truck, McLaren's Senna Supercar, and More Cars News This Week

WIRED | Mar-9-2018, 14:13:15 GMT

If the phrase "autonomous vehicle" makes you think of some four-wheeled pod tootling around the city, you need to think bigger. For all the talk of robo-taxis, the smart money says that when this tech comes for our roads, it'll start on the highway. And if you're looking for proof, grab your sunglasses, a trucker hat, and a ticket to Arizona or Florida--the testing grounds of choice for the companies teaching trucks to drive themselves. This week, we have news of Uber testing in the Copper State and startup Starsky Robotics sending a truck down a Florida highway, all by itself. Meanwhile, the titans of the auto industry met at the Geneva Motor Show, where the talk centered on supercars--and how to take down Elon Musk.

artificial intelligence, senna supercar, uber, (12 more...)

WIRED

Country:

North America > United States > Arizona (0.26)
Europe > Switzerland (0.06)
North America > United States > District of Columbia > Washington (0.05)
(3 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks > Manufacturer (1.00)
Government > Regional Government > North America Government > United States Government (0.31)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)

Real-Time Energy Disaggregation of a Distribution Feeder's Demand Using Online Learning

Ledva, Gregory S., Balzano, Laura, Mathieu, Johanna L.

arXiv.org Machine Learning | Feb-26-2018

Though distribution system operators have been adding more sensors to their networks, they still often lack an accurate real-time picture of the behavior of distributed energy resources such as demand-responsive electric loads and residential solar generation. Such information could improve system reliability, economic efficiency, and environmental impact. Rather than installing additional, costly sensing and communication infrastructure to obtain additional real-time information, it may be possible to use existing sensing capabilities and leverage knowledge about the system to reduce the need for new infrastructure. In this paper, we disaggregate a distribution feeder's demand measurements into: 1) the demand of a population of air conditioners, and 2) the demand of the remaining loads connected to the feeder. We use an online learning algorithm, Dynamic Fixed Share (DFS), that uses the real-time distribution feeder measurements as well as models generated from historical building- and device-level data. We develop two implementations of the algorithm and conduct case studies using real demand data from households and commercial buildings to investigate the effectiveness of the algorithm. The case studies demonstrate that DFS can effectively perform online disaggregation and that the choice and construction of models included in the algorithm affect its accuracy, which is comparable to that of a set of Kalman filters.
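For intuition about this family of algorithms, the sketch below implements one step of the classic Fixed Share expert-weighting update, a simpler relative of DFS. It is not the paper's algorithm: DFS also carries dynamic models for each expert, which are omitted here. The squared-error loss, the uniform sharing rule, and the names `fixed_share_step`, `eta`, and `share` are assumptions made for illustration.

```python
import math


def fixed_share_step(weights, predictions, observed, eta=1.0, share=0.05):
    """One Fixed Share update over a set of candidate models ("experts").

    weights     -- current weights over the models, summing to 1
    predictions -- each model's prediction of the measured value
    observed    -- the measurement that actually arrived
    Returns the weighted estimate made before seeing `observed`,
    together with the updated weights.
    """
    # Combined estimate from the current weights.
    estimate = sum(w * p for w, p in zip(weights, predictions))

    # Exponential-weights update using squared-error loss.
    losses = [(p - observed) ** 2 for p in predictions]
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)

    # Sharing step: mix in a uniform distribution so that no model's weight
    # collapses to zero and a previously poor model can recover quickly.
    n = len(weights)
    new_weights = [(1.0 - share) * s / total + share / n for s in scaled]
    return estimate, new_weights
```

Calling this once per incoming measurement gives an online estimate that tracks whichever model currently fits best, and the `share` term is what allows the preferred model to change over time.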

artificial intelligence, machine learning, real time system, (17 more...)

arXiv.org Machine Learning

1701.04389

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Hawaii (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.88)
Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

WWII bombers once built on new Michigan driverless car test site

USATODAY - Tech Top Stories | Mar-13-2017, 02:10:11 GMT

The ex-bomber plant and home of Rosie the Riveter will transform this year into an autonomous vehicle technology test site. It once housed one of the largest factories in the world, pumping out B24 bombers to help America and her allies win World War II, and later transmissions when it was owned by General Motors. The former Willow Run bomber plant in Ypsilanti Township is mostly a memory now, demolished following GM's 2009 bankruptcy, except for a piece that houses the Yankee Air Museum. Land at the former 335-acre Willow Run site in Ypsilanti Township, where the American Center for Mobility is located, in January 2017. The land will be used for testing autonomous vehicles.

artificial intelligence, maddox, vehicle, (14 more...)

USATODAY - Tech Top Stories

Country:

North America > United States > North Carolina (0.05)
North America > United States > Iowa (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
(9 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
